{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ColumnSelector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Implementation of a column selector class for scikit-learn pipelines."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> from mlxtend.feature_selection import ColumnSelector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ColumnSelector` can be used for \"manual\" feature selection, e.g., as part of a grid search via a scikit-learn pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### References\n",
    "\n",
    "-"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1 - Feature Selection via GridSearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load a simple benchmark dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "\n",
    "iris = load_iris()\n",
    "X = iris.data\n",
    "y = iris.target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create all possible combinations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(0,), (1,), (2,), (3,), (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (0, 1, 2, 3)]\n"
     ]
    }
   ],
   "source": [
    "from itertools import combinations\n",
    "\n",
    "all_comb = []\n",
    "for size in range(1, 5):\n",
    "    all_comb += list(combinations(range(X.shape[1]), r=size))\n",
    "print(all_comb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Feature and model selection via grid search:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Best parameters: {'columnselector__cols': (2, 3), 'kneighborsclassifier__n_neighbors': 1}\n",
      "Best performance: 0.98\n"
     ]
    }
   ],
   "source": [
    "from mlxtend.feature_selection import ColumnSelector\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.pipeline import make_pipeline\n",
    "\n",
    "pipe = make_pipeline(StandardScaler(),\n",
    "                     ColumnSelector(),\n",
    "                     KNeighborsClassifier())\n",
    "\n",
    "param_grid = {'columnselector__cols': all_comb,\n",
    "              'kneighborsclassifier__n_neighbors': list(range(1, 11))}\n",
    "\n",
    "grid = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)\n",
    "grid.fit(X, y)\n",
    "print('Best parameters:', grid.best_params_)\n",
    "print('Best performance:', grid.best_score_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## ColumnSelector\n",
      "\n",
      "*ColumnSelector(cols=None)*\n",
      "\n",
      "Base class for all estimators in scikit-learn\n",
      "\n",
      "**Notes**\n",
      "\n",
      "All estimators should specify all the parameters that can be set\n",
      "at the class level in their ``__init__`` as explicit keyword\n",
      "arguments (no ``*args`` or ``**kwargs``).\n",
      "\n",
      "### Methods\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit(X, y=None)*\n",
      "\n",
      "Mock method. Does nothing.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]\n",
      "\n",
      "    Training vectors, where n_samples is the number of samples and\n",
      "    n_features is the number of features.\n",
      "\n",
      "- `y` : array-like, shape = [n_samples] (default: None)\n",
      "\n",
      "\n",
      "**Returns**\n",
      "\n",
      "self\n",
      "\n",
      "<hr>\n",
      "\n",
      "*fit_transform(X, y=None)*\n",
      "\n",
      "Return a slice of the input array.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]\n",
      "\n",
      "    Training vectors, where n_samples is the number of samples and\n",
      "    n_features is the number of features.\n",
      "\n",
      "- `y` : array-like, shape = [n_samples] (default: None)\n",
      "\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `X_slice` : shape = [n_samples, k_features]\n",
      "\n",
      "    Subset of the feature space where k_features <= n_features\n",
      "\n",
      "<hr>\n",
      "\n",
      "*get_params(deep=True)*\n",
      "\n",
      "Get parameters for this estimator.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "deep: boolean, optional\n",
      "If True, will return the parameters for this estimator and\n",
      "contained subobjects that are estimators.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `params` : mapping of string to any\n",
      "\n",
      "    Parameter names mapped to their values.\n",
      "\n",
      "<hr>\n",
      "\n",
      "*set_params(**params)*\n",
      "\n",
      "Set the parameters of this estimator.\n",
      "\n",
      "The method works on simple estimators as well as on nested objects\n",
      "(such as pipelines). The latter have parameters of the form\n",
      "``<component>__<parameter>`` so that it's possible to update each\n",
      "component of a nested object.\n",
      "\n",
      "**Returns**\n",
      "\n",
      "self\n",
      "\n",
      "<hr>\n",
      "\n",
      "*transform(X, y=None)*\n",
      "\n",
      "Return a slice of the input array.\n",
      "\n",
      "**Parameters**\n",
      "\n",
      "- `X` : {array-like, sparse matrix}, shape = [n_samples, n_features]\n",
      "\n",
      "    Training vectors, where n_samples is the number of samples and\n",
      "    n_features is the number of features.\n",
      "\n",
      "- `y` : array-like, shape = [n_samples] (default: None)\n",
      "\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `X_slice` : shape = [n_samples, k_features]\n",
      "\n",
      "    Subset of the feature space where k_features <= n_features\n",
      "\n",
      "<br><br>\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.feature_selection/ColumnSelector.md', 'r') as f:\n",
    "    s = f.read() + '<br><br>'\n",
    "print(s)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
